\.*/\1/p' opensource_links.php
```

The `-n` option tells `sed` not to print any lines from the input:

```bash
sed -n ' ' opensource_links.php   # prints nothing
```

Without the `-n` option, `sed` prints every line, and if you request a substitution, as in

```bash
sed 's//foo/' opensource_links.php
```

it prints every line, replacing the first occurrence of `` on each line with `foo`. If you add a `p` flag to the end of the substitution and give `sed` the `-n` option, it prints only the lines in which a substitution was made:

```bash
sed -n 's//foo/p' opensource_links.php
```

---
name: sed-example2

## Using `sed` to display the URLs in an HTML file

- If we now use as the target pattern `.*`, this will match all lines that start with the `` followed by any text at all, and __it will capture the URL in its variable `\1`__. We use that fact to make `\1` the replacement text and add the `p` flag:

  ```
  sed -n 's/\.*/\1/p' opensource_links.php
  ```

  which is the needed command.

- And we can easily create a Markdown file with this next modification:

  ```
  sed -n 's/\([^<]*\)<.*/[\2](\1)\n/p' opensource_links.php
  ```

---
name: cut_paste1

## Filters: `cut` and `paste`

- Both `cut` and `paste` are handy. `cut` extracts parts of each line, selected either by character position or by field position, and can write the selected pieces with a different output delimiter. You can specify what delimits the fields.

- The default field separator is the TAB character.

- You can use `cut` on csv files to pick out columns.

- Examples

  ```bash
  cut -f1,5 -d: /etc/passwd
  ```

  prints the first and fifth fields of the `/etc/passwd` file, i.e., the username and "gcos" field.

  ```bash
  cut -c1-10 myfile
  ```

  prints only the first 10 characters of each line of `myfile`.

---
name: cut_paste2

## Filters: `cut` and `paste` (2)

- `paste` writes to standard output lines formed by joining the sequentially corresponding lines of its input files, separated by TABs.
- Example: Suppose that there are two files named `cities` and `countries` whose contents are as shown below (in two columns to save space).

.left-column2[
```bash
$ cat cities
Rome
Paris
London
Dublin
Tokyo
```
]
.right-column2[
```bash
$ cat countries
Italy
France
England
Ireland
Japan
```
]
.below-column2[
Then the `paste` command will "merge" the files onto standard output, separating the words with tabs:
```bash
$ paste cities countries
Rome	Italy
Paris	France
London	England
Dublin	Ireland
Tokyo	Japan
```
]

---
name: cut_paste3

## Filters: `cut` and `paste` (3)

- `paste` can read standard input alongside named files. A hyphen '-' represents standard input when it is used in place of a filename argument, as in

  ```bash
  paste file -
  ```

  We can use this idea to number the lines in our previous example 1 through 5, as follows:

  ```bash
  $ seq 1 5 | paste - cities countries
  1	Rome	Italy
  2	Paris	France
  3	London	England
  4	Dublin	Ireland
  5	Tokyo	Japan
  ```

- See what happens when you try

  ```bash
  $ seq 1 30 | paste - - -
  ```

  Pretty interesting?

---
name: awk1

## `awk`

- `awk` is not just an extremely powerful filter; _it is a programming language_. The `awk` filter implements the AWK programming language, named after its authors, __A__ho, __W__einberger, and __K__ernighan.

- You give `awk` a program and files on which the program is run, and `awk` runs the program on one file after another. The program is either enclosed in single quotes on the command line, or passed to `awk` in a file like so:

  ```bash
  awk [AWK-OPTIONS] -f program-file file ...
  ```

- If the program is enclosed in single quotes on the command line, then you run `awk` like this:

  ```bash
  awk [AWK-OPTIONS] 'program-text' file ...
  ```

- Unlike the other filters, `awk` treats each input line as a sequence of fields, delimited by an __input field separator__, by default any amount of whitespace. Fields are named `$1`, `$2`, `$3`, and so on; `$0` is the entire line.
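  A minimal sketch of these field names in action; the sample input line and the shell function name `show_fields` are made up for illustration:

  ```bash
  # show_fields (hypothetical helper): for each input line,
  # print the second field and then the entire line
  show_fields() {
      awk '{ print "second field: " $2; print "whole line:   " $0 }'
  }

  # The default separator is any run of whitespace,
  # so the extra spaces do not create empty fields
  printf 'alpha   beta gamma\n' | show_fields
  ```

  Here `$2` is `beta`, while `$0` keeps the original spacing of the line.
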
- If there is no file argument, `awk` reads from standard input.

- There is a lot to learn about `awk`; these slides contain just some simple examples.

---
name: awk2

## `awk` (2)

- Every `awk` program is a sequence of __pattern-action__ instructions or function definitions*.

.footnote[ * There are other kinds of statements as well, not covered here. ]

- A pattern-action instruction is of the form

  ```bash
  pattern {action}
  ```

  where the pattern can be any of

  - `BEGIN`
  - `END`
  - a regular expression
  - a comparison
  - empty

  and the action is an instruction in a mostly C-like syntax.

- Example

  ```bash
  awk ' $1 == "reboot" {print $2; }' file1 file2 file3
  ```

  which prints the second field of every line whose first field is the string "reboot" in the input files `file1`, `file2`, and `file3`.

---
name: awk3

## `awk` (3)

- The input field separator can be a character or a regular expression. The command-line option `-F` sets it; use single quotes to protect regular expressions from the shell:

  ```bash
  awk -F: '{print $3}' /etc/passwd
  ```

  which separates fields with the colon ":", and

  ```bash
  awk -F'aa*' '{print $3}' file1
  ```

  which separates fields with one or more `a`'s.

- Simple uses of `awk` are to print lines with fields that meet a condition, or to print the fields in a different order. `awk` has both a simple `print` instruction and a C-style `printf`:

  ```bash
  awk -F, ' {print $1, $2}' names.csv
  ```

  prints the first two fields of every line with a space between them, whereas

  ```bash
  awk -F, '{printf "%s\t%s\n", $1, $2}' names.csv
  ```

  prints the first two fields with a TAB between them and a newline after.

---
name: awk4

## `awk` (4)

- The __`BEGIN`__ pattern causes `awk` to execute the associated action __before__ reading its input. The __`END`__ pattern's action is executed __after__ all input has been read. The following adds the values of field `$1` on all lines and prints their sum:
  ```bash
  awk ' BEGIN { sum = 0 } { sum += $1 } END { print sum }'
  ```

- `awk` has variables that do not need declarations, and
- `awk` has operators just like those in C.

- The `awk` built-in variable __`NR`__ is the total number of records (lines) read so far, so this command prints the average of the field `$1` values:

  ```bash
  awk ' BEGIN { sum = 0 } { sum += $1 } END { if ( NR > 0 ) { print sum/NR } }'
  ```

- __`NF`__ is the number of fields on the current line. This

  ```bash
  awk ' { printf "Line %d has %d fields.\n", NR, NF }'
  ```

  displays, for each input line, a line of the form

  ```bash
  Line 1 has 10 fields.
  ...
  ```

---
name: awk5

## `awk` (5)

- The variable `$NF` is the last field on a line, so

  ```bash
  awk '{ print $NF }'
  ```

  prints the last field on every input line.

- The variable `FNR` is the input record number __in the current input file__. We can print the headings of a CSV file using the following `awk` script:

  ```bash
  cat somecsvfile.csv | awk -F, ' \
      { if (FNR==1) \
          for (i=1; i<=NF; i++) \
              printf "Field %d:\t%s\n",i,$i \
      }'
  ```

  The backslashes continue the program from one line to the next. Because the program is inside single quotes, bash already passes it to `awk` as a single argument, so in bash they are not strictly required. Notice that `$i` is used to iterate through `$1`, `$2`, ..., `$NF`.

---
name: awk6

## `awk` (6)

- You can use extended regular expression matching in the pattern:

  ```bash
  awk -F, ' BEGIN {sum = 0} $1 ~ /^[AB]/ { sum += $3 } END {print sum}' names.csv
  ```

  which adds the values from field `$3` for all input lines of the csv file `names.csv` whose first field starts with an "A" or a "B". The `^` anchors the match to the start of the field; without it, the pattern would match any first field that merely contains an "A" or a "B". All extended regular expressions can be used in `awk`. They must be enclosed between slashes, `/ /`.

- Logical expressions can be patterns. This `awk` script finds an unused user id to assign to a new user, namely one more than the largest id in the password file:

  ```bash
  ypcat passwd | \
  awk 'BEGIN {FS = ":" ; MAX = 0 } ($3 > MAX ) {MAX = $3} \
       END {printf " %s\n", MAX+1} '
  ```

  The backslashes are used at the ends of lines that continue the command.
  `FS` is the field separator variable; assigning to it in a `BEGIN` action is another way to set the separator.

---
name: awk7

## `awk` (7)

- `awk` has the following control flow statements:

  - if (condition) statement [ else statement ]
  - while (condition) statement
  - do statement while (condition)
  - for (expr1; expr2; expr3) statement
  - for (var in array) statement
  - break
  - continue
  - delete array[index]
  - delete array
  - exit [ expression ]
  - { statements }
  - a switch statement like C's.

- `awk` also has many built-in functions, including numeric functions, string functions, time functions, bit manipulation, and more.

- It has array variables as well.

- There is much more to `awk` than can be described in a few slides. The man page is a good place to look for a comprehensive description of what it can do.

---
name: more_filters

## Filters yet to come:

- Some filters have not yet made it into these slides. The most important of these are

  - sed, shuf, split, tr

  Of these, `sed` is the most powerful, and the hardest to master. The others have a shallow learning curve, and you can read their man pages to figure out how to use them.

---
name: links

## Useful Links

A list of some relevant links

- [List of Common Linux Commands](https://ss64.com/bash/)
- [Linux Scripting Tutorial](https://bash.cyberciti.biz/guide/Main_Page)
- [GNU bash Manual](https://www.gnu.org/software/bash/manual/bash.html)
- [Introduction to Text Manipulation in Unix](https://developer.ibm.com/articles/au-unixtext/#25.Resources|outline)

---
name: exercise1

## Exercises 1

Start with some easier ones first. In the exercises, the word _command_ means a structured command: you might need to use pipes or even nested commands.

1. Look at the man page for the `shuf` command. Then write a command that generates a permutation of the integers from 1 to 100 in a file named `permutation100`.

1. Write a command that puts the absolute path names of all C or C++ files in your home directory or below it in a file named `mysourcecode` in your home directory. These files also include header files with a suffix of `.h`.

1. Write a command to display the first 10 lines of every file in the current working directory. For simplicity, assume that the directory contains only plain text files.

---
name: exercise2

## Exercises 2

1. Write a command that prints the number of times that the word "lie" occurs in the set of all files in the current directory with a `.html` extension. Make it case insensitive.

1. This one is not hard after you do a bit of rummaging through the man page for `grep`. Write a command that will print every line of a file preceded by its line number followed by a colon. For example, if the file has two lines

   ```bash
   All that glitters
   is not gold.
   ```

   then it will display

   ```bash
   1: All that glitters
   2: is not gold.
   ```

---
name: exercise3

## Exercises 3

1. The `ps` command displays process status information for a set of processes running on the local machine. With the `-ef` flags, `ps` lists the status of every process. Look at its output and then write a __script__ named `showprocs` that, when given a user's name, prints the number of processes currently running on that user's behalf. Although the slides so far have not shown the form of a script, you can model it on the following:

   ```bash
   #!/bin/bash
   # Print the first command line argument three times and exit
   echo $1 $1 $1
   ```

   The first line is required in its exact form. The remaining lines that start with `#` are comment lines; you put them there as documentation. The `$1` is a shell variable that stores the first word on the command line after the command itself. `$1` is replaced by the word typed after the script's name when the command is executed.
   For example, if the above script is in a file named `echo3`, then we would see the following:

   ```
   $ echo3 hello
   hello hello hello
   ```

   if we first make the script executable by typing `chmod +x echo3`.

---
name: exercise4

## Exercises 4

1. The `history` command in bash displays the commands you have run recently. The file `~/.bash_history` stores by default the last 500 commands. Inspect that file and then write a command that displays the ten commands you have used most recently.

1. Before changing a file, it is sometimes safer to make a copy of it and tag the copy with today's date, for example by copying `README.md` to `README.md.2019.11.10`. Write a script named `cpdated` that could be used to make a copy with the current date appended to the name, so that

   ```bash
   cpdated myfile
   ```

   would create `myfile.2019.11.10` if run on November 10, 2019. Don't worry about error checking such as whether the file exists or whether you have permission to create a file in the current directory.

---
name: exercise5

## Exercises 5

1. Suppose that you want to drop down the headings by one level each in a set of markdown files whose names end in a `.md` extension, in a directory named `documents`. For example, you want a heading starting with `#` to become a `##` heading, and a `##` heading to become a `###` heading. But you do not want level 4 headings to change, so `####` stays as `####`. You are not sure whether there are spaces after the heading tag before the actual text. You can do any of the following:

    1. Open a graphical editor and use its __find/replace__ feature on each and every file.
    1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it.
    1. Use `vi` or `vim` to do a __global substitution__.
    1. Use a well-designed `sed` command to do all of the replacements in a single shot.
    The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. What is the `vi` substitution that will do this? What is the `sed` command that can do this?

---
name: exercise6

## Exercises 6

1. Suppose that for stylistic reasons, you need to replace every C-style single-line comment in your C++ file by a C++ style comment. For example, you need to replace

   ```bash
   /* The following code finds the min element in the array */
   ```

   by

   ```bash
   // The following code finds the min element in the array
   ```

   regardless of whether there is code to the left of the comment. You can do any of the following:

    1. Open a graphical editor and use its __find/replace__ feature.
    1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it.
    1. Use `vi` or `vim` to do a __global substitution__.
    1. Use a well-designed `sed` command to do all of the replacements in a single shot.

    The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. What is the `vi` substitution that will do this? What is the `sed` command that can do this?

---
name: exercise7

## Exercises 7

1. (HARD) There is a command called `cal` that displays a calendar in Linux:

   ```bash
   $ cal
        April 2019
   Su Mo Tu We Th Fr Sa
       1  2  3  4  5  6
    7  8  9 10 11 12 13
   14 15 16 17 18 19 20
   21 22 23 24 25 26 27
   28 29 30
   ```

   Using only `echo` and `paste`, try to display the calendar in a similar format for the current month. Hint: if you enclose a sequence of commands in parentheses, they are treated as a single command, e.g.:

   ```bash
   $ (echo hello; echo goodbye) | wc
         2       2      14
   $ echo hello; echo goodbye | wc
   hello
         1       1       8
   ```
/foo/' opensource_links.php ``` it will print every line, substituting as directed each first occurrence of `
` on a line by `foo`. If you add a `p` flag to the end of the substitution, and put the `-n` option on `sed`, it will only print the lines in which you made a substitution: ```bash sed -n 's/
/foo/p' opensource_links.php ``` --- name: sed-example2 ## Using `sed` to display the URLS in an HTML file - If we now use as the target pattern `
.*` this will match all lines that start with the `` and followed by any text at all, and __it will capture the URL in its variable `\1`__. We use that fact to make `\1` the replacement text and add the `p` option: ``` sed -n 's/\.*/\1/p' opensource_links.php ``` which is the needed command. - And we can easily create a Markdown files by this next modification: ``` sed -n 's/\([^<]*\)<.*/[\2](\1)\n/p' opensource_links.php ``` --- name: cut_paste1 ## Filters: `cut` and `paste` - Both `cut` and `paste` are handy. `cut` can be used to cut lines in specific places, whether at character positions, or by field positions, and can output the cut pieces with different output delimiters. You can specify what delimits the fields. - The default field separator is the TAB character. - You can use `cut` on csv files to pick out columns: - Examples ```bash cut -f1,5 -d: /etc/passwd ``` prints the first and fifth fields of the `/etc/password` file, i.e., the username and "gcos" field. ```bash cut –c1-10 myfile ``` prints only the first 10 characters of each line of `myfile`. --- name: cut_paste2 ## Filters: `cut` and `paste` (2) - `paste` combines lines consisting of the sequentially corresponding lines from different files, separated by TABs, to standard output. - Example: Suppose that there are two files named `cities` and `countries` whose contents are as shown below (in two columns to save space.) .left-column2[ ```bash $ cat cities Rome Paris London Dublin Tokyo ``` ] .right-column2[ ```bash $ cat countries Italy France England Ireland Japan ``` ] .below-column2[ Then the `paste` command will "merge" the files onto standard output, separating the words with tabs: ```bash $ paste cities countries Rome Italy Paris France London England Dublin Ireland Tokyo Japan ``` ] --- name: cut_paste3 ## Filters: `cut` and `paste` (3) - We can feed standard input into the `paste` command simultaneously. 
A hyphen '-' represents standard input when it is used in place of a filename argument, as in ```bash paste file - ``` We can use this idea to add numbers to the lines in our previous example, e.g. line 1, 2,3, 4, and 5, as follows: ```bash $seq 1 5 | paste - cities countries 1 Rome Italy 2 Paris France 3 London England 4 Dublin Ireland 5 Tokyo Japan ``` - See what happens when you try ```bash $seq 1 30 | paste - - - ``` Pretty interesting? --- name: awk1 ## `awk` - `awk` is not just an extremely powerful filter; _it is a programming language_. The `awk` filter implements the AWK programming language, named after its authors, __A__ho, __K__ernighan, and __W__einberger. - You give `awk` a program and files on which the program is run, and `awk` runs the program on one file after another. The program is either enclosed in single quotes on the command line, or passed to `awk` in a file like so: ```bash awk [AWK-OPTIONS] -f program-file file ... ``` - If the program is enclosed in single quotes on the command-line, then you run `awk` like this: ```bash awk [AWK-OPTIONS] 'program-text' file ... ``` - Unlike the other filters, `awk` treats each input line as a sequence of fields, delimited by an __input field separator__, by default any amount of whitespace. Fields are named `$1`,`$2`, `$3`, and so on. $0 is the entire line. - If there is no file argument, `awk` reads from standard input. - There is a lot to learn about `awk` - these slides contain just some simple examples. --- name: awk2 ## `awk` (2) - Every `awk` program is a sequence of __pattern-action__ instructions or function definitions*. .footnote[ * There are other kinds of statements as well, not covered here. ] - A pattern-action instruction is of the form ```bash pattern {action} ``` where the pattern can be any of - `BEGIN` - `END` - a regular expression - a comparison - empty and the action is an instruction in a mostly C-like syntax. 
- Example ```bash awk ' $1 == "reboot" {print $2; }' file1 file2 file3 ``` which prints the second field in any line whose first field is the string "reboot" from the input files `file1`, `file2`, and `file3`. --- name: awk3 ## `awk` (3) - The input field separator can be a character or a regular expression. The command-line option `-F` sets it; use single quotes to protect regular expressions from the shell: ```bash awk -F: '{print $3}' /etc/passwd ``` which separates fields with the colon ":", and ```bash awk -F'aa*' '{print $3}' file1 ``` which separates fields with one or more `a`'s. - Simples uses of `awk` are to print lines with fields that meet a condition, or to print the fields in a different order. `awk` has both a simple `print` instruction as well as all forms of the `C` `printf`: ```bash awk -F, ' {print $1, $2}' names.csv ``` prints the first two fields of every line with a space between them, whereas ```bash awk -F, '{printf "%s\t%s\n", $1, $2}' names.csv ``` prints the first two fields with a TAB between them and a newline after. --- name: awk4 ## `awk` (4) - The __`BEGIN`__ pattern causes `awk` to execute the associated action __before__ reading its input. The __`END`__ pattern's action is executed __after__ all input has been read. The following adds the values of field $1 on all lines and prints their sum. ```bash awk ' BEGIN { sum = 0 } { sum += $1 } END { print sum }' ``` - `awk` has variables that do not need declarations, and - `awk` has operators just like those in C. - The `awk` built-in variable __`NR`__ is the total number of records (lines) read so far. __`NF`__ is the number of fields on the current line, so this command prints the average of the field $1 values: ```bash awk ' BEGIN { sum = 0 } { sum += $1 } END { if ( NR > 0 ) { print sum/NR } }' ``` - This ```bash awk ' { printf "Line %d has %d fields.\n", NR, NF }' ``` displays for each input line, a line of the form ```bash Line 1 has 10 fields. ... 
``` --- name: awk5 ## `awk` (5) - The variable `$NF` is the last field on a line, so ```bash awk '{ print $NF }' ``` prints the last field on every input line. - The variable `FNR` is the input record number __in the current input file__. We can print the headings of a CSV file using the following `awk` script: ```bash cat somecsvfile.csv | awk -F, ' \ { if (FNR==1) \ for (i=1; i<=NF; i++) \ printf "Field %d:\t%s\n",i,$i \ }' ``` The backslashes are needed to prevent the shell from treating each line as a separate command. Notice that `$i` is used to iterate through `$1`, `$2`,... `$NF`. --- name: awk6 ## `awk` (6) - You can use extended regular expression matching in the pattern: ```bash awk -F, ' BEGIN {sum = 0} $1 ~ /[AB].*/ { sum += $3 } END {print sum}' names.csv ``` which adds the values from field $3 for all input lines whose first field starts with an "A" or a "B" from the csv file `names.csv`. All extended regular expressions can be used in `awk`. They must be enclosed in `/ /` brackets. - Logical expressions can be patterns. This `awk` script finds the smallest unused user id in the password file to assign to a new user: ```bash ypcat passwd | \ awk 'BEGIN {FS = ":" ; MAX = 0 } ($3 > MAX ) {MAX = $3} \ END {printf " %s\n", MAX+1} ' ``` The baskslashes are used at the ends of lines that are part of the command. And `FS` is the field separator variable. This is another way to dynamically change it. --- name: awk7 ## `awk` (7) - `awk` has the following control flow statements: - if (condition) statement [ else statement ] - while (condition) statement - do statement while (condition) - for (expr1; expr2; expr3) statement - for (var in array) statement - break - continue - delete array[index] - delete array - exit [ expression ] - { statements } - a switch statement like C's. - `awk` also has many built-in functions, including numeric functions, string functions, time functions, bit manipulationm and more. - It has array variables as well. 
- There is much more to `awk` than can be described in a few slides. The man page is a good place to look for a comprehensive description of what it can do. --- name: more_filters ## Filters yet to come: - Some filters have not yet made it into these slides. The most important of these are - sed, shuf, split, tr Of these, `sed` is the most powerful, and the hardest to master. The others have a shallow learning curve, and you can read the manpages for them to figure out how to use them. --- name: links ## Useful Links A list of some relevant links - [List of Common Linux Commands](https://ss64.com/bash/) - [Linux Scripting Tutorial](https://bash.cyberciti.biz/guide/Main_Page) - [GNU bash Manual](https://www.gnu.org/software/bash/manual/bash.html) - [Introduction to Text Manipulation in Unix](https://developer.ibm.com/articles/au-unixtext/#25.Resources|outline) --- name: exercise1 ## Exercises 1 Start with some easier ones first. In the exercises, the word _command_ means a structured command - you might need to use pipes or even nested commands. 1. Look at the man page for the `shuf` command. Then write a command that generates a permutation of the integers from 1 to 100 in a file named `permutation100`. 1. Write a command that puts the absolute path names to all C or C++ files in your home directory or below it in a file named `mysourcecode` in your home directory. These files also include header files with a suffix of `.h`. 1. Write a command to display the first 10 lines of every file in the current working directory.For simplicity, assume that the directory contains only plain text files. --- name: exercise2 ## Exercises 2 1. Write a command that prints the number of times that the word "lie" occurs in the set of all files in the current directory with a `.html extension`. Make it case insensitive. 1. This one is not hard after you do a bit of rummaging through the man page for `grep`. 
Write a command that will print every line of a file preceded by its line number followed by a colon. For example, if the file has two lines ```bash All that glitters is not gold. ``` then it will display ```bash 1: All that glitters 2: is not gold. ``` --- name: exercise3 ## Exercises 3 1. The `ps` command displays process status information for a set of processes running on the local machine. With the `-ef` flags, `ps` lists the status of every process. Look at its output and then write a __script__ named `showprocs` that, when given a user's name, prints the number of processes currently running on that user's behalf. Although the slides so far have not shown the form of a script, you can model it from the following: ```bash #!/bin/bash # Print the first command line argument and exit echo $1 $1 $1 ``` The first line is required in its exact form. The remaining lines that start with `#` are comment lines; you put them there as documentation. The `$1` is a shell variable that stores the first word on the command line after the command itself. `$1` is replaced by the word typed after the script's name when the command is executed. For example if the above script is in a file named `echo3` then we would see the following: ``` $ echo3 hello hello hello hello ``` if we first make the script executable by typing `chmod +x echo3`. --- name: exercise4 ## Exercises 4 1. The `history` command in bash displays the commands you have run recently. The file `~/.bash_history` stores by default the last 500 commands. Inspect that file and then write a command that displays the ten commands you have used the most recently. 1. Before changing a file, it is sometimes safe to make a copy of it and tag the copy with today's date, as for example, by changing `README.md` to `README.md.2019.11.10`. 
Write a script named `cpdated` that could be used to make a copy with the current date appended to the name, so that ```bash cpdated myfile ``` would create `myfile.2019.11.10` if run on November 10, 2019. Don't worry about error checking such as whether the file exists or whether you have permission to create a file in the current directory. --- name: exercise5 ## Exercises 5 1. Suppose that you want to drop down the headings by one level each in a set of markdown files whose names end in a `.md` extension, in a directory named `documents`. For example, you want a heading starting with `#` to become a `##` heading, and a `##` heading to become a `###` heading. But you do not want level 4 headings to change, so `####` stays as `####`. You are not sure whether there are spaces after the heading tag before the actual text. You can do any of the following: 1. Open a graphical editor and use its __find/replace__ feature on each and every file. 1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it. 1. Use `vi` or `vim` to do a __global substitution__. 1. Use a well-designed `sed` command to do all of the replacements in a single shot. The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. What is the `vi` substitution that will do this? What is the `sed` command that can do this? --- name: exercise6 ## Exercises 6 1. Suppose that for stylistic reasons, you need to replace every C-style single-line comment in your C++ file by C++ style comments. For example, you need to replace ```bash /* The following code finds the min element in the array */ ``` by ```bash // The following code finds the min element in the array ``` regardless of whether there is code to the left of the comment. You can do any of the following: 1. Open a graphical editor and use its __find/replace__ feature. 1. 
Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it. 1. Use `vi` or `vim` to do a __global substitution__. 1. Use a well-designed `sed` command to do all of the replacements in a single shot. The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. What is the `vi` substitution that will do this? What is the `sed` command that can do this? --- name: exercise7 ## Exercises 7 1. (HARD) There is a command called `cal` that displays a calendar in Linux: ```bash $ cal April 2019 Su Mo Tu We Th Fr Sa 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 ``` Using only `echo` and `paste`, try to display the calendar in a similar format for the current month. Hint: if you enclose a sequence of commands in parentheses, they are treated as a single command, e.g.: ```bash $ (echo hello; echo goodbye) | wc 2 2 14 $ echo hello; echo goodbye | wc hello 1 1 8 ```
` and followed by any text at all, and __it will capture the URL in its variable `\1`__. We use that fact to make `\1` the replacement text and add the `p` option: ``` sed -n 's/\.*/\1/p' opensource_links.php ``` which is the needed command. - And we can easily create a Markdown files by this next modification: ``` sed -n 's/\([^<]*\)<.*/[\2](\1)\n/p' opensource_links.php ``` --- name: cut_paste1 ## Filters: `cut` and `paste` - Both `cut` and `paste` are handy. `cut` can be used to cut lines in specific places, whether at character positions, or by field positions, and can output the cut pieces with different output delimiters. You can specify what delimits the fields. - The default field separator is the TAB character. - You can use `cut` on csv files to pick out columns: - Examples ```bash cut -f1,5 -d: /etc/passwd ``` prints the first and fifth fields of the `/etc/password` file, i.e., the username and "gcos" field. ```bash cut –c1-10 myfile ``` prints only the first 10 characters of each line of `myfile`. --- name: cut_paste2 ## Filters: `cut` and `paste` (2) - `paste` combines lines consisting of the sequentially corresponding lines from different files, separated by TABs, to standard output. - Example: Suppose that there are two files named `cities` and `countries` whose contents are as shown below (in two columns to save space.) .left-column2[ ```bash $ cat cities Rome Paris London Dublin Tokyo ``` ] .right-column2[ ```bash $ cat countries Italy France England Ireland Japan ``` ] .below-column2[ Then the `paste` command will "merge" the files onto standard output, separating the words with tabs: ```bash $ paste cities countries Rome Italy Paris France London England Dublin Ireland Tokyo Japan ``` ] --- name: cut_paste3 ## Filters: `cut` and `paste` (3) - We can feed standard input into the `paste` command simultaneously. 
A hyphen `-` represents standard input when it is used in place of a filename argument, as in
```bash
paste file -
```
We can use this idea to number the lines in our previous example from 1 to 5, as follows:
```bash
$ seq 1 5 | paste - cities countries
1       Rome    Italy
2       Paris   France
3       London  England
4       Dublin  Ireland
5       Tokyo   Japan
```
- See what happens when you try
```bash
$ seq 1 30 | paste - - -
```
Pretty interesting?

---
name: awk1

## `awk`

- `awk` is not just an extremely powerful filter; _it is a programming language_. The `awk` filter implements the AWK programming language, named after its authors, __A__ho, __W__einberger, and __K__ernighan.
- You give `awk` a program and files on which the program is run, and `awk` runs the program on one file after another. The program is either enclosed in single quotes on the command line, or passed to `awk` in a file like so:
```bash
awk [AWK-OPTIONS] -f program-file file ...
```
- If the program is enclosed in single quotes on the command line, then you run `awk` like this:
```bash
awk [AWK-OPTIONS] 'program-text' file ...
```
- Unlike the other filters, `awk` treats each input line as a sequence of fields, delimited by an __input field separator__, by default any amount of whitespace. Fields are named `$1`, `$2`, `$3`, and so on. `$0` is the entire line.
- If there is no file argument, `awk` reads from standard input.
- There is a lot to learn about `awk`; these slides contain just some simple examples.

---
name: awk2

## `awk` (2)

- Every `awk` program is a sequence of __pattern-action__ instructions or function definitions*.

.footnote[
* There are other kinds of statements as well, not covered here.
]

- A pattern-action instruction is of the form
```bash
pattern {action}
```
where the pattern can be any of
  - `BEGIN`
  - `END`
  - a regular expression
  - a comparison
  - empty

and the action is an instruction in a mostly C-like syntax.
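
As a rough illustration of the pattern kinds listed above, the sketch below puts one rule of each kind into a single program. The file `log.txt`, its contents, and the helper name `demo_patterns` are made up for this example and are not part of `awk`.

```bash
# Hypothetical sample input, created only for this sketch.
printf '%s\n' 'reboot system 5' 'alice pts/0 12' 'bob pts/1 3' > log.txt

# demo_patterns is an illustrative helper name, not part of awk.
demo_patterns() {
    awk '
    BEGIN      { print "start" }             # before any input is read
    /^reboot/  { print "regex:", $0 }        # regular-expression pattern
    $3 > 4     { print "comparison:", $1 }   # comparison pattern
               { count++ }                   # empty pattern: every line
    END        { print "lines:", count }     # after all input is read
    ' "$1"
}

demo_patterns log.txt
```

With this input the program prints `start`, then `regex: reboot system 5`, `comparison: reboot`, `comparison: alice`, and finally `lines: 3`. Rules are tried in order for every input line, so a single line can trigger more than one action.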
- Example
```bash
awk ' $1 == "reboot" {print $2; }' file1 file2 file3
```
which prints the second field of every line in the input files `file1`, `file2`, and `file3` whose first field is the string "reboot".

---
name: awk3

## `awk` (3)

- The input field separator can be a character or a regular expression. The command-line option `-F` sets it; use single quotes to protect regular expressions from the shell:
```bash
awk -F: '{print $3}' /etc/passwd
```
which separates fields with the colon ":", and
```bash
awk -F'aa*' '{print $3}' file1
```
which separates fields with one or more `a`'s.
- Simple uses of `awk` are to print lines with fields that meet a condition, or to print the fields in a different order. `awk` has both a simple `print` instruction and a `printf` like the one in C:
```bash
awk -F, ' {print $1, $2}' names.csv
```
prints the first two fields of every line with a space between them, whereas
```bash
awk -F, '{printf "%s\t%s\n", $1, $2}' names.csv
```
prints the first two fields with a TAB between them and a newline after.

---
name: awk4

## `awk` (4)

- The __`BEGIN`__ pattern causes `awk` to execute the associated action __before__ reading its input. The __`END`__ pattern's action is executed __after__ all input has been read. The following adds the values of field `$1` on all lines and prints their sum.
```bash
awk ' BEGIN { sum = 0 }
      { sum += $1 }
      END { print sum }'
```
- `awk` has variables that do not need declarations, and
- `awk` has operators just like those in C.
- The `awk` built-in variable __`NR`__ is the total number of records (lines) read so far, and __`NF`__ is the number of fields on the current line. In the `END` action, `NR` is the total number of input lines, so this command prints the average of the field `$1` values:
```bash
awk ' BEGIN { sum = 0 }
      { sum += $1 }
      END { if ( NR > 0 ) { print sum/NR } }'
```
- This
```bash
awk ' { printf "Line %d has %d fields.\n", NR, NF }'
```
displays, for each input line, a line of the form
```bash
Line 1 has 10 fields.
...
```

---
name: awk5

## `awk` (5)

- The expression `$NF` is the last field on a line, so
```bash
awk '{ print $NF }'
```
prints the last field on every input line.
- The variable `FNR` is the input record number __in the current input file__. We can print the headings of a CSV file using the following `awk` script:
```bash
cat somecsvfile.csv | awk -F, ' \
  { if (FNR==1) \
      for (i=1; i<=NF; i++) \
        printf "Field %d:\t%s\n",i,$i \
  }'
```
The program is inside single quotes, so the shell passes it to `awk` unchanged, newlines included; the backslashes are not needed by the shell, and `awk` itself treats a backslash at the end of a line as a line continuation. Notice that `$i` is used to iterate through `$1`, `$2`, ..., `$NF`.

---
name: awk6

## `awk` (6)

- You can use extended regular expression matching in the pattern:
```bash
awk -F, ' BEGIN {sum = 0}
          $1 ~ /^[AB]/ { sum += $3 }
          END {print sum}' names.csv
```
which adds the values from field `$3` for all input lines in the CSV file `names.csv` whose first field starts with an "A" or a "B". All extended regular expressions can be used in `awk`. They must be enclosed between slashes, as in `/regex/`.
- Logical expressions can be patterns. This `awk` script finds an unused user id (one more than the largest id in use) in the password file to assign to a new user:
```bash
ypcat passwd | \
awk 'BEGIN {FS = ":" ; MAX = 0 } ($3 > MAX ) {MAX = $3} \
     END {printf " %s\n", MAX+1} '
```
The backslashes mark lines that continue on the next line. `FS` is the field separator variable; assigning to it in a `BEGIN` action is another way to set the input field separator.

---
name: awk7

## `awk` (7)

- `awk` has the following control flow statements:
  - `if (condition) statement [ else statement ]`
  - `while (condition) statement`
  - `do statement while (condition)`
  - `for (expr1; expr2; expr3) statement`
  - `for (var in array) statement`
  - `break`
  - `continue`
  - `delete array[index]`
  - `delete array`
  - `exit [ expression ]`
  - `{ statements }`
  - a switch statement like C's.
- `awk` also has many built-in functions, including numeric functions, string functions, time functions, bit manipulation, and more.
- It has array variables as well.
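
The array variables and the `for (var in array)` statement mentioned above can be sketched as follows. This is only an illustration: `count_users` is a hypothetical helper name, and the input lines are invented for the example.

```bash
# count_users is a hypothetical helper name: it tallies how often each
# value of field 1 occurs on standard input, using an awk array.
count_users() {
    awk '
    { seen[$1]++ }                   # array indexed by the first field
    END {
        for (name in seen)           # iteration order is unspecified
            printf "%s %d\n", name, seen[name]
    }' | sort                        # sort for a stable display order
}

printf '%s\n' 'alice login' 'bob login' 'alice logout' | count_users
```

Array indices in `awk` are strings, so no declaration or size is needed; the first use of `seen[$1]` creates the element. The pipe through `sort` is there only because the traversal order of `for (name in seen)` is not defined.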
- There is much more to `awk` than can be described in a few slides. The man page is a good place to look for a comprehensive description of what it can do.

---
name: more_filters

## Filters yet to come:

- Some filters have not yet made it into these slides. The most important of these are
  - sed, shuf, split, tr

  Of these, `sed` is the most powerful, and the hardest to master. The others have a shallow learning curve, and you can read the man pages for them to figure out how to use them.

---
name: links

## Useful Links

A list of some relevant links

- [List of Common Linux Commands](https://ss64.com/bash/)
- [Linux Scripting Tutorial](https://bash.cyberciti.biz/guide/Main_Page)
- [GNU bash Manual](https://www.gnu.org/software/bash/manual/bash.html)
- [Introduction to Text Manipulation in Unix](https://developer.ibm.com/articles/au-unixtext/#25.Resources|outline)

---
name: exercise1

## Exercises 1

Start with some easier ones first. In the exercises, the word _command_ means a structured command; you might need to use pipes or even nested commands.

1. Look at the man page for the `shuf` command. Then write a command that generates a permutation of the integers from 1 to 100 in a file named `permutation100`.
1. Write a command that puts the absolute path names of all C or C++ files in your home directory or below it in a file named `mysourcecode` in your home directory. These files also include header files with a suffix of `.h`.
1. Write a command to display the first 10 lines of every file in the current working directory. For simplicity, assume that the directory contains only plain text files.

---
name: exercise2

## Exercises 2

1. Write a command that prints the number of times that the word "lie" occurs in the set of all files in the current directory with a `.html` extension. Make it case insensitive.
1. This one is not hard after you do a bit of rummaging through the man page for `grep`.
Write a command that will print every line of a file preceded by its line number followed by a colon. For example, if the file has two lines
```bash
All that glitters
is not gold.
```
then it will display
```bash
1: All that glitters
2: is not gold.
```

---
name: exercise3

## Exercises 3

1. The `ps` command displays process status information for a set of processes running on the local machine. With the `-ef` flags, `ps` lists the status of every process. Look at its output and then write a __script__ named `showprocs` that, when given a user's name, prints the number of processes currently running on that user's behalf. Although the slides so far have not shown the form of a script, you can model it from the following:
```bash
#!/bin/bash
# Print the first command line argument three times and exit
echo $1 $1 $1
```
The first line is required in its exact form. The remaining lines that start with `#` are comment lines; you put them there as documentation. The `$1` is a shell variable that stores the first word on the command line after the command itself. `$1` is replaced by the word typed after the script's name when the command is executed. For example, if the above script is in a file named `echo3` then we would see the following:
```
$ echo3 hello
hello hello hello
```
if we first make the script executable by typing `chmod +x echo3`.

---
name: exercise4

## Exercises 4

1. The `history` command in bash displays the commands you have run recently. The file `~/.bash_history` stores by default the last 500 commands. Inspect that file and then write a command that displays the ten commands you have used the most recently.
1. Before changing a file, it is sometimes safer to make a copy of it and tag the copy with today's date, for example, by copying `README.md` to `README.md.2019.11.10`.
Write a script named `cpdated` that could be used to make a copy with the current date appended to the name, so that
```bash
cpdated myfile
```
would create `myfile.2019.11.10` if run on November 10, 2019. Don't worry about error checking such as whether the file exists or whether you have permission to create a file in the current directory.

---
name: exercise5

## Exercises 5

1. Suppose that you want to drop down the headings by one level each in a set of markdown files whose names end in a `.md` extension, in a directory named `documents`. For example, you want a heading starting with `#` to become a `##` heading, and a `##` heading to become a `###` heading. But you do not want level 4 headings to change, so `####` stays as `####`. You are not sure whether there are spaces after the heading tag before the actual text. You can do any of the following:
    1. Open a graphical editor and use its __find/replace__ feature on each and every file.
    1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it.
    1. Use `vi` or `vim` to do a __global substitution__.
    1. Use a well-designed `sed` command to do all of the replacements in a single shot.

    The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. What is the `vi` substitution that will do this? What is the `sed` command that can do this?

---
name: exercise6

## Exercises 6

1. Suppose that for stylistic reasons, you need to replace every C-style single-line comment in your C++ file by C++ style comments. For example, you need to replace
```bash
/* The following code finds the min element in the array */
```
by
```bash
// The following code finds the min element in the array
```
regardless of whether there is code to the left of the comment. You can do any of the following:
    1. Open a graphical editor and use its __find/replace__ feature.
    1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it.
    1. Use `vi` or `vim` to do a __global substitution__.
    1. Use a well-designed `sed` command to do all of the replacements in a single shot.

    The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. What is the `vi` substitution that will do this? What is the `sed` command that can do this?

---
name: exercise7

## Exercises 7

1. (HARD) There is a command called `cal` that displays a calendar in Linux:
```bash
$ cal
     April 2019
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30
```
Using only `echo` and `paste`, try to display the calendar in a similar format for the current month. Hint: if you enclose a sequence of commands in parentheses, they are treated as a single command, e.g.:
```bash
$ (echo hello; echo goodbye) | wc
      2       2      14
$ echo hello; echo goodbye | wc
hello
      1       1       8
```
\.*/\1/p' opensource_links.php ``` which is the needed command. - And we can easily create a Markdown files by this next modification: ``` sed -n 's/\([^<]*\)<.*/[\2](\1)\n/p' opensource_links.php ``` --- name: cut_paste1 ## Filters: `cut` and `paste` - Both `cut` and `paste` are handy. `cut` can be used to cut lines in specific places, whether at character positions, or by field positions, and can output the cut pieces with different output delimiters. You can specify what delimits the fields. - The default field separator is the TAB character. - You can use `cut` on csv files to pick out columns: - Examples ```bash cut -f1,5 -d: /etc/passwd ``` prints the first and fifth fields of the `/etc/password` file, i.e., the username and "gcos" field. ```bash cut –c1-10 myfile ``` prints only the first 10 characters of each line of `myfile`. --- name: cut_paste2 ## Filters: `cut` and `paste` (2) - `paste` combines lines consisting of the sequentially corresponding lines from different files, separated by TABs, to standard output. - Example: Suppose that there are two files named `cities` and `countries` whose contents are as shown below (in two columns to save space.) .left-column2[ ```bash $ cat cities Rome Paris London Dublin Tokyo ``` ] .right-column2[ ```bash $ cat countries Italy France England Ireland Japan ``` ] .below-column2[ Then the `paste` command will "merge" the files onto standard output, separating the words with tabs: ```bash $ paste cities countries Rome Italy Paris France London England Dublin Ireland Tokyo Japan ``` ] --- name: cut_paste3 ## Filters: `cut` and `paste` (3) - We can feed standard input into the `paste` command simultaneously. A hyphen '-' represents standard input when it is used in place of a filename argument, as in ```bash paste file - ``` We can use this idea to add numbers to the lines in our previous example, e.g. 
line 1, 2,3, 4, and 5, as follows: ```bash $seq 1 5 | paste - cities countries 1 Rome Italy 2 Paris France 3 London England 4 Dublin Ireland 5 Tokyo Japan ``` - See what happens when you try ```bash $seq 1 30 | paste - - - ``` Pretty interesting? --- name: awk1 ## `awk` - `awk` is not just an extremely powerful filter; _it is a programming language_. The `awk` filter implements the AWK programming language, named after its authors, __A__ho, __K__ernighan, and __W__einberger. - You give `awk` a program and files on which the program is run, and `awk` runs the program on one file after another. The program is either enclosed in single quotes on the command line, or passed to `awk` in a file like so: ```bash awk [AWK-OPTIONS] -f program-file file ... ``` - If the program is enclosed in single quotes on the command-line, then you run `awk` like this: ```bash awk [AWK-OPTIONS] 'program-text' file ... ``` - Unlike the other filters, `awk` treats each input line as a sequence of fields, delimited by an __input field separator__, by default any amount of whitespace. Fields are named `$1`,`$2`, `$3`, and so on. $0 is the entire line. - If there is no file argument, `awk` reads from standard input. - There is a lot to learn about `awk` - these slides contain just some simple examples. --- name: awk2 ## `awk` (2) - Every `awk` program is a sequence of __pattern-action__ instructions or function definitions*. .footnote[ * There are other kinds of statements as well, not covered here. ] - A pattern-action instruction is of the form ```bash pattern {action} ``` where the pattern can be any of - `BEGIN` - `END` - a regular expression - a comparison - empty and the action is an instruction in a mostly C-like syntax. - Example ```bash awk ' $1 == "reboot" {print $2; }' file1 file2 file3 ``` which prints the second field in any line whose first field is the string "reboot" from the input files `file1`, `file2`, and `file3`. 
--- name: awk3 ## `awk` (3) - The input field separator can be a character or a regular expression. The command-line option `-F` sets it; use single quotes to protect regular expressions from the shell: ```bash awk -F: '{print $3}' /etc/passwd ``` which separates fields with the colon ":", and ```bash awk -F'aa*' '{print $3}' file1 ``` which separates fields with one or more `a`'s. - Simples uses of `awk` are to print lines with fields that meet a condition, or to print the fields in a different order. `awk` has both a simple `print` instruction as well as all forms of the `C` `printf`: ```bash awk -F, ' {print $1, $2}' names.csv ``` prints the first two fields of every line with a space between them, whereas ```bash awk -F, '{printf "%s\t%s\n", $1, $2}' names.csv ``` prints the first two fields with a TAB between them and a newline after. --- name: awk4 ## `awk` (4) - The __`BEGIN`__ pattern causes `awk` to execute the associated action __before__ reading its input. The __`END`__ pattern's action is executed __after__ all input has been read. The following adds the values of field $1 on all lines and prints their sum. ```bash awk ' BEGIN { sum = 0 } { sum += $1 } END { print sum }' ``` - `awk` has variables that do not need declarations, and - `awk` has operators just like those in C. - The `awk` built-in variable __`NR`__ is the total number of records (lines) read so far. __`NF`__ is the number of fields on the current line, so this command prints the average of the field $1 values: ```bash awk ' BEGIN { sum = 0 } { sum += $1 } END { if ( NR > 0 ) { print sum/NR } }' ``` - This ```bash awk ' { printf "Line %d has %d fields.\n", NR, NF }' ``` displays for each input line, a line of the form ```bash Line 1 has 10 fields. ... ``` --- name: awk5 ## `awk` (5) - The variable `$NF` is the last field on a line, so ```bash awk '{ print $NF }' ``` prints the last field on every input line. - The variable `FNR` is the input record number __in the current input file__. 
We can print the headings of a CSV file using the following `awk` script: ```bash cat somecsvfile.csv | awk -F, ' \ { if (FNR==1) \ for (i=1; i<=NF; i++) \ printf "Field %d:\t%s\n",i,$i \ }' ``` The backslashes are needed to prevent the shell from treating each line as a separate command. Notice that `$i` is used to iterate through `$1`, `$2`,... `$NF`. --- name: awk6 ## `awk` (6) - You can use extended regular expression matching in the pattern: ```bash awk -F, ' BEGIN {sum = 0} $1 ~ /[AB].*/ { sum += $3 } END {print sum}' names.csv ``` which adds the values from field $3 for all input lines whose first field starts with an "A" or a "B" from the csv file `names.csv`. All extended regular expressions can be used in `awk`. They must be enclosed in `/ /` brackets. - Logical expressions can be patterns. This `awk` script finds the smallest unused user id in the password file to assign to a new user: ```bash ypcat passwd | \ awk 'BEGIN {FS = ":" ; MAX = 0 } ($3 > MAX ) {MAX = $3} \ END {printf " %s\n", MAX+1} ' ``` The baskslashes are used at the ends of lines that are part of the command. And `FS` is the field separator variable. This is another way to dynamically change it. --- name: awk7 ## `awk` (7) - `awk` has the following control flow statements: - if (condition) statement [ else statement ] - while (condition) statement - do statement while (condition) - for (expr1; expr2; expr3) statement - for (var in array) statement - break - continue - delete array[index] - delete array - exit [ expression ] - { statements } - a switch statement like C's. - `awk` also has many built-in functions, including numeric functions, string functions, time functions, bit manipulationm and more. - It has array variables as well. - There is much more to `awk` than can be described in a few slides. The man page is a good place to look for a comprehensive description of what it can do. --- name: more_filters ## Filters yet to come: - Some filters have not yet made it into these slides. 
The most important of these are - sed, shuf, split, tr Of these, `sed` is the most powerful, and the hardest to master. The others have a shallow learning curve, and you can read the manpages for them to figure out how to use them. --- name: links ## Useful Links A list of some relevant links - [List of Common Linux Commands](https://ss64.com/bash/) - [Linux Scripting Tutorial](https://bash.cyberciti.biz/guide/Main_Page) - [GNU bash Manual](https://www.gnu.org/software/bash/manual/bash.html) - [Introduction to Text Manipulation in Unix](https://developer.ibm.com/articles/au-unixtext/#25.Resources|outline) --- name: exercise1 ## Exercises 1 Start with some easier ones first. In the exercises, the word _command_ means a structured command - you might need to use pipes or even nested commands. 1. Look at the man page for the `shuf` command. Then write a command that generates a permutation of the integers from 1 to 100 in a file named `permutation100`. 1. Write a command that puts the absolute path names to all C or C++ files in your home directory or below it in a file named `mysourcecode` in your home directory. These files also include header files with a suffix of `.h`. 1. Write a command to display the first 10 lines of every file in the current working directory.For simplicity, assume that the directory contains only plain text files. --- name: exercise2 ## Exercises 2 1. Write a command that prints the number of times that the word "lie" occurs in the set of all files in the current directory with a `.html extension`. Make it case insensitive. 1. This one is not hard after you do a bit of rummaging through the man page for `grep`. Write a command that will print every line of a file preceded by its line number followed by a colon. For example, if the file has two lines ```bash All that glitters is not gold. ``` then it will display ```bash 1: All that glitters 2: is not gold. ``` --- name: exercise3 ## Exercises 3 1. 
The `ps` command displays process status information for a set of processes running on the local machine. With the `-ef` flags, `ps` lists the status of every process. Look at its output and then write a __script__ named `showprocs` that, when given a user's name, prints the number of processes currently running on that user's behalf. Although the slides so far have not shown the form of a script, you can model it from the following: ```bash #!/bin/bash # Print the first command line argument and exit echo $1 $1 $1 ``` The first line is required in its exact form. The remaining lines that start with `#` are comment lines; you put them there as documentation. The `$1` is a shell variable that stores the first word on the command line after the command itself. `$1` is replaced by the word typed after the script's name when the command is executed. For example if the above script is in a file named `echo3` then we would see the following: ``` $ echo3 hello hello hello hello ``` if we first make the script executable by typing `chmod +x echo3`. --- name: exercise4 ## Exercises 4 1. The `history` command in bash displays the commands you have run recently. The file `~/.bash_history` stores by default the last 500 commands. Inspect that file and then write a command that displays the ten commands you have used the most recently. 1. Before changing a file, it is sometimes safe to make a copy of it and tag the copy with today's date, as for example, by changing `README.md` to `README.md.2019.11.10`. Write a script named `cpdated` that could be used to make a copy with the current date appended to the name, so that ```bash cpdated myfile ``` would create `myfile.2019.11.10` if run on November 10, 2019. Don't worry about error checking such as whether the file exists or whether you have permission to create a file in the current directory. --- name: exercise5 ## Exercises 5 1. 
Suppose that you want to drop down the headings by one level each in a set of markdown files whose names end in a `.md` extension, in a directory named `documents`. For example, you want a heading starting with `#` to become a `##` heading, and a `##` heading to become a `###` heading. But you do not want level 4 headings to change, so `####` stays as `####`. You are not sure whether there are spaces after the heading tag before the actual text. You can do any of the following: 1. Open a graphical editor and use its __find/replace__ feature on each and every file. 1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it. 1. Use `vi` or `vim` to do a __global substitution__. 1. Use a well-designed `sed` command to do all of the replacements in a single shot. The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. What is the `vi` substitution that will do this? What is the `sed` command that can do this? --- name: exercise6 ## Exercises 6 1. Suppose that for stylistic reasons, you need to replace every C-style single-line comment in your C++ file by C++ style comments. For example, you need to replace ```bash /* The following code finds the min element in the array */ ``` by ```bash // The following code finds the min element in the array ``` regardless of whether there is code to the left of the comment. You can do any of the following: 1. Open a graphical editor and use its __find/replace__ feature. 1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it. 1. Use `vi` or `vim` to do a __global substitution__. 1. Use a well-designed `sed` command to do all of the replacements in a single shot. The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. 
What is the `vi` substitution that will do this? What is the `sed` command that can do this? --- name: exercise7 ## Exercises 7 1. (HARD) There is a command called `cal` that displays a calendar in Linux: ```bash $ cal April 2019 Su Mo Tu We Th Fr Sa 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 ``` Using only `echo` and `paste`, try to display the calendar in a similar format for the current month. Hint: if you enclose a sequence of commands in parentheses, they are treated as a single command, e.g.: ```bash $ (echo hello; echo goodbye) | wc 2 2 14 $ echo hello; echo goodbye | wc hello 1 1 8 ```
\([^<]*\)<.*/[\2](\1)\n/p' opensource_links.php ``` --- name: cut_paste1 ## Filters: `cut` and `paste` - Both `cut` and `paste` are handy. `cut` can be used to cut lines in specific places, whether at character positions, or by field positions, and can output the cut pieces with different output delimiters. You can specify what delimits the fields. - The default field separator is the TAB character. - You can use `cut` on csv files to pick out columns: - Examples ```bash cut -f1,5 -d: /etc/passwd ``` prints the first and fifth fields of the `/etc/password` file, i.e., the username and "gcos" field. ```bash cut –c1-10 myfile ``` prints only the first 10 characters of each line of `myfile`. --- name: cut_paste2 ## Filters: `cut` and `paste` (2) - `paste` combines lines consisting of the sequentially corresponding lines from different files, separated by TABs, to standard output. - Example: Suppose that there are two files named `cities` and `countries` whose contents are as shown below (in two columns to save space.) .left-column2[ ```bash $ cat cities Rome Paris London Dublin Tokyo ``` ] .right-column2[ ```bash $ cat countries Italy France England Ireland Japan ``` ] .below-column2[ Then the `paste` command will "merge" the files onto standard output, separating the words with tabs: ```bash $ paste cities countries Rome Italy Paris France London England Dublin Ireland Tokyo Japan ``` ] --- name: cut_paste3 ## Filters: `cut` and `paste` (3) - We can feed standard input into the `paste` command simultaneously. A hyphen '-' represents standard input when it is used in place of a filename argument, as in ```bash paste file - ``` We can use this idea to add numbers to the lines in our previous example, e.g. line 1, 2,3, 4, and 5, as follows: ```bash $seq 1 5 | paste - cities countries 1 Rome Italy 2 Paris France 3 London England 4 Dublin Ireland 5 Tokyo Japan ``` - See what happens when you try ```bash $seq 1 30 | paste - - - ``` Pretty interesting? 
--- name: awk1 ## `awk` - `awk` is not just an extremely powerful filter; _it is a programming language_. The `awk` filter implements the AWK programming language, named after its authors, __A__ho, __K__ernighan, and __W__einberger. - You give `awk` a program and files on which the program is run, and `awk` runs the program on one file after another. The program is either enclosed in single quotes on the command line, or passed to `awk` in a file like so: ```bash awk [AWK-OPTIONS] -f program-file file ... ``` - If the program is enclosed in single quotes on the command-line, then you run `awk` like this: ```bash awk [AWK-OPTIONS] 'program-text' file ... ``` - Unlike the other filters, `awk` treats each input line as a sequence of fields, delimited by an __input field separator__, by default any amount of whitespace. Fields are named `$1`,`$2`, `$3`, and so on. $0 is the entire line. - If there is no file argument, `awk` reads from standard input. - There is a lot to learn about `awk` - these slides contain just some simple examples. --- name: awk2 ## `awk` (2) - Every `awk` program is a sequence of __pattern-action__ instructions or function definitions*. .footnote[ * There are other kinds of statements as well, not covered here. ] - A pattern-action instruction is of the form ```bash pattern {action} ``` where the pattern can be any of - `BEGIN` - `END` - a regular expression - a comparison - empty and the action is an instruction in a mostly C-like syntax. - Example ```bash awk ' $1 == "reboot" {print $2; }' file1 file2 file3 ``` which prints the second field in any line whose first field is the string "reboot" from the input files `file1`, `file2`, and `file3`. --- name: awk3 ## `awk` (3) - The input field separator can be a character or a regular expression. 
The command-line option `-F` sets it; use single quotes to protect regular expressions from the shell: ```bash awk -F: '{print $3}' /etc/passwd ``` which separates fields with the colon ":", and ```bash awk -F'aa*' '{print $3}' file1 ``` which separates fields with one or more `a`'s. - Simples uses of `awk` are to print lines with fields that meet a condition, or to print the fields in a different order. `awk` has both a simple `print` instruction as well as all forms of the `C` `printf`: ```bash awk -F, ' {print $1, $2}' names.csv ``` prints the first two fields of every line with a space between them, whereas ```bash awk -F, '{printf "%s\t%s\n", $1, $2}' names.csv ``` prints the first two fields with a TAB between them and a newline after. --- name: awk4 ## `awk` (4) - The __`BEGIN`__ pattern causes `awk` to execute the associated action __before__ reading its input. The __`END`__ pattern's action is executed __after__ all input has been read. The following adds the values of field $1 on all lines and prints their sum. ```bash awk ' BEGIN { sum = 0 } { sum += $1 } END { print sum }' ``` - `awk` has variables that do not need declarations, and - `awk` has operators just like those in C. - The `awk` built-in variable __`NR`__ is the total number of records (lines) read so far. __`NF`__ is the number of fields on the current line, so this command prints the average of the field $1 values: ```bash awk ' BEGIN { sum = 0 } { sum += $1 } END { if ( NR > 0 ) { print sum/NR } }' ``` - This ```bash awk ' { printf "Line %d has %d fields.\n", NR, NF }' ``` displays for each input line, a line of the form ```bash Line 1 has 10 fields. ... ``` --- name: awk5 ## `awk` (5) - The variable `$NF` is the last field on a line, so ```bash awk '{ print $NF }' ``` prints the last field on every input line. - The variable `FNR` is the input record number __in the current input file__. 
  We can print the headings of a CSV file using the following `awk` script:

  ```bash
  cat somecsvfile.csv | awk -F, ' \
    { if (FNR==1) \
        for (i=1; i<=NF; i++) \
          printf "Field %d:\t%s\n",i,$i \
    }'
  ```

  The backslashes mark line continuations. They are not strictly required here: everything between the single quotes is passed to `awk` as a single argument, and `awk` itself treats a backslash at the end of a line as a continuation. Notice that `$i` is used to iterate through `$1`, `$2`, ..., `$NF`.

---
name: awk6

## `awk` (6)

- You can use extended regular expression matching in the pattern:

  ```bash
  awk -F, ' BEGIN {sum = 0}
            $1 ~ /^[AB]/ { sum += $3 }
            END {print sum}' names.csv
  ```

  which adds the values from field `$3` for all input lines of the CSV file `names.csv` whose first field starts with an "A" or a "B". The `^` anchors the match to the start of the field; without it, the pattern would match any first field that contains an "A" or a "B" anywhere. All extended regular expressions can be used in `awk`. They are written between slashes, as in `/regex/`.
- Logical expressions can be patterns. This `awk` script finds a user id that is not in use, namely one more than the largest user id in the password file, to assign to a new user:

  ```bash
  ypcat passwd | \
  awk 'BEGIN {FS = ":" ; MAX = 0 } ($3 > MAX ) {MAX = $3} \
       END {printf " %s\n", MAX+1} '
  ```

  The backslashes are placed at the ends of lines where the command continues on the next line. `FS` is the field separator variable; assigning to it in a `BEGIN` action is another way to set the field separator.

---
name: awk7

## `awk` (7)

- `awk` has the following control flow statements:
  - `if (condition) statement [ else statement ]`
  - `while (condition) statement`
  - `do statement while (condition)`
  - `for (expr1; expr2; expr3) statement`
  - `for (var in array) statement`
  - `break`
  - `continue`
  - `delete array[index]`
  - `delete array`
  - `exit [ expression ]`
  - `{ statements }`
  - a switch statement like C's.
- `awk` also has many built-in functions, including numeric functions, string functions, time functions, bit manipulation, and more.
- It has array variables as well.
- There is much more to `awk` than can be described in a few slides. The man page is a good place to look for a comprehensive description of what it can do.

---
name: more_filters

## Filters yet to come:

- Some filters have not yet made it into these slides.
  The most important of these are
  - `sed`, `shuf`, `split`, `tr`

  Of these, `sed` is the most powerful, and the hardest to master. The others are easy to learn, and you can read their man pages to figure out how to use them.

---
name: links

## Useful Links

A list of some relevant links

- [List of Common Linux Commands](https://ss64.com/bash/)
- [Linux Scripting Tutorial](https://bash.cyberciti.biz/guide/Main_Page)
- [GNU bash Manual](https://www.gnu.org/software/bash/manual/bash.html)
- [Introduction to Text Manipulation in Unix](https://developer.ibm.com/articles/au-unixtext/#25.Resources|outline)

---
name: exercise1

## Exercises 1

Start with some easier ones first. In the exercises, the word _command_ means a structured command: you might need to use pipes or even nested commands.

1. Look at the man page for the `shuf` command. Then write a command that generates a permutation of the integers from 1 to 100 in a file named `permutation100`.
1. Write a command that puts the absolute path names of all C or C++ files in your home directory or below it in a file named `mysourcecode` in your home directory. These files also include header files with a suffix of `.h`.
1. Write a command to display the first 10 lines of every file in the current working directory. For simplicity, assume that the directory contains only plain text files.

---
name: exercise2

## Exercises 2

1. Write a command that prints the number of times that the word "lie" occurs in the set of all files in the current directory with a `.html` extension. Make it case insensitive.
1. This one is not hard after you do a bit of rummaging through the man page for `grep`. Write a command that will print every line of a file preceded by its line number followed by a colon. For example, if the file has two lines

   ```bash
   All that glitters
   is not gold.
   ```

   then it will display

   ```bash
   1: All that glitters
   2: is not gold.
   ```

---
name: exercise3

## Exercises 3

1.
   The `ps` command displays process status information for a set of processes running on the local machine. With the `-ef` flags, `ps` lists the status of every process. Look at its output and then write a __script__ named `showprocs` that, when given a user's name, prints the number of processes currently running on that user's behalf. Although the slides so far have not shown the form of a script, you can model it on the following:

   ```bash
   #!/bin/bash
   # Print the first command line argument three times and exit
   echo $1 $1 $1
   ```

   The first line is required in its exact form. The remaining lines that start with `#` are comment lines; you put them there as documentation. `$1` is a shell variable that stores the first word on the command line after the command itself, so `$1` is replaced by the word typed after the script's name when the command is executed. For example, if the above script is in a file named `echo3`, then we would see the following:

   ```
   $ echo3 hello
   hello hello hello
   ```

   if we first make the script executable by typing `chmod +x echo3` and the current directory is in your `PATH` (otherwise type `./echo3 hello`).

---
name: exercise4

## Exercises 4

1. The `history` command in bash displays the commands you have run recently. The file `~/.bash_history` stores by default the last 500 commands. Inspect that file and then write a command that displays the ten commands you have used most recently.
1. Before changing a file, it is sometimes safer to first make a copy of it and tag the copy with today's date, for example by copying `README.md` to `README.md.2019.11.10`. Write a script named `cpdated` that makes a copy with the current date appended to the name, so that

   ```bash
   cpdated myfile
   ```

   would create `myfile.2019.11.10` if run on November 10, 2019. Don't worry about error checking, such as whether the file exists or whether you have permission to create a file in the current directory.

---
name: exercise5

## Exercises 5

1.
   Suppose that you want to drop down the headings by one level each in a set of markdown files whose names end in a `.md` extension, in a directory named `documents`. For example, you want a heading starting with `#` to become a `##` heading, and a `##` heading to become a `###` heading. But you do not want level 4 headings to change, so `####` stays as `####`. You are not sure whether there are spaces after the heading tag before the actual text. You can do any of the following:

   1. Open a graphical editor and use its __find/replace__ feature on each and every file.
   1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it.
   1. Use `vi` or `vim` to do a __global substitution__.
   1. Use a well-designed `sed` command to do all of the replacements in a single shot.

   The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files. What is the `vi` substitution that will do this? What is the `sed` command that can do this?

---
name: exercise6

## Exercises 6

1. Suppose that for stylistic reasons, you need to replace every C-style single-line comment in your C++ file by C++ style comments. For example, you need to replace

   ```bash
   /* The following code finds the min element in the array */
   ```

   by

   ```bash
   // The following code finds the min element in the array
   ```

   regardless of whether there is code to the left of the comment. You can do any of the following:

   1. Open a graphical editor and use its __find/replace__ feature.
   1. Open a command-line editor like `vi` or `vim` and use its find functionality to find every occurrence and change it.
   1. Use `vi` or `vim` to do a __global substitution__.
   1. Use a well-designed `sed` command to do all of the replacements in a single shot.

   The last alternative is clearly the best use of your time, and you can ask `sed` to make a backup in case you are nervous about ruining your files.
   What is the `vi` substitution that will do this? What is the `sed` command that can do this?

---
name: exercise7

## Exercises 7

1. (HARD) There is a command called `cal` that displays a calendar in Linux:

   ```bash
   $ cal
        April 2019
   Su Mo Tu We Th Fr Sa
       1  2  3  4  5  6
    7  8  9 10 11 12 13
   14 15 16 17 18 19 20
   21 22 23 24 25 26 27
   28 29 30
   ```

   Using only `echo` and `paste`, try to display the calendar in a similar format for the current month. Hint: if you enclose a sequence of commands in parentheses, they are treated as a single command, e.g.:

   ```bash
   $ (echo hello; echo goodbye) | wc
         2       2      14
   $ echo hello; echo goodbye | wc
   hello
         1       1       8
   ```